SUMMARY 


Sequences encoding the surface projection glycoprotein of the coronavirus, murine 
hepatitis virus (MHV), strain JHM, have been cloned into pAT153 using cDNA 
produced by priming with specific oligonucleotides on infected cell RNA. The regions 
of three clones pJMS1010, pJS112 and pJS92, which together encompass the surface 
protein gene have been sequenced by the chain termination method. The sequence of 
the primary translation product, deduced from the DNA sequence, predicts a 
polypeptide of 1235 amino acids with a molecular weight of 136600. This polypeptide 
displays the features characteristic of a group 1 membrane protein: an amino-terminal 
signal sequence and carboxy-terminal membrane and cytoplasmic domains. There are 
21 potential glycosylation sites in the polypeptide and a cysteine-rich region in the 
vicinity of the transmembrane domain. During maturation, proteolytic processing of 
the polypeptide occurs, and at positions 624 to 628 the sequence Arg-Arg-Ala-Arg-Arg 
is found, which is similar to a number of basic sequences involved in the cleavage 
of enveloped RNA virus glycoproteins. The fusogenic properties of the MHV surface 
protein do not appear to correlate with a strongly hydrophobic region at the putative 
amino terminus of the carboxy-terminal cleavage product. 


INTRODUCTION 


Coronaviruses are pleomorphic, enveloped viruses which replicate in the cytoplasm of 
vertebrate cells and are associated with diseases of economic importance (Siddell et al., 1983). 
In the laboratory, the murine hepatitis virus (MHV) has been extensively used for the study of 
viral pathogenesis, in particular as a component of a model for demyelinating diseases of man 
(Knobler et al., 1982; Watanabe et al., 1983; Massa et al., 1986). The MHV genome is a 
monopartite, positive-stranded RNA of approximately 18 kb. The genome encodes the 
nucleocapsid (N), membrane (M or E1) and surface (S or E2) proteins of the virion, as well as 
several non-structural proteins (Sturman & Holmes, 1983; Siddell et al., 1983). 

The MHV S protein is synthesized on membrane-bound ribosomes as a co-translationally 
N-glycosylated polypeptide with an apparent mol. wt. of 150000 (Niemann et al., 1982; Holmes et 
al., 1981; Siddell et al., 1981). The polypeptides synthesized in vitro or in tunicamycin-treated 
cells have mol. wt. of approximately 120000 (Rottier et al., 1981; Siddell, 1983). During 
transport within the cell, oligosaccharides are trimmed and terminal sugars are added, resulting 
in a 180000 mol. wt. S polypeptide. Shortly before, or at the time of, virus release a proportion of 
S is cleaved into two approximately 90000 mol. wt. polypeptides, S1 and S2 (Niemann et al., 
1982; Sturman et al., 1985). S1 and S2 (which are also referred to as 90B and 90A; Sturman et al., 
1985) cannot be distinguished by SDS-PAGE but can be separated by hydroxyapatite 
chromatography. It has been shown that S2 is acylated (Ricard & Sturman, 1985). The cleavage 
of the S polypeptide is a host cell-dependent event (Frana et al., 1985) and activates its cell-fusing 
ability. The S protein is also responsible for the attachment/infectivity of the MHV virion and 
some monoclonal hybridoma antibodies which react with the S protein are able to mediate virus 
neutralization in vitro and passively protect mice against lethal virus challenge in vivo (Collins et 
al., 1982). 

The organization and expression of the MHV genome has been studied in detail (for reviews, 
see Holmes, 1985; Siddell, 1986) (Fig. 1). Briefly, in MHV-infected cells six subgenomic 
mRNAs, as well as genome-sized RNA, are produced. These mRNAs form a 3’ co-terminal 
nested set and each also has a common 5’ leader sequence of about 70 bases (Lai et al., 1984). The 
available evidence suggests that only the information contained within the ‘unique’ sequences at 
the 5’ end of each mRNA (i.e. those absent from the next smallest RNA) is translated into 
protein (Siddell, 1986). The translation of size-fractionated MHV mRNAs has shown that 
subgenomic mRNA 3 encodes the S protein (Siddell, 1983). 

Previously, we have isolated cDNA clones containing overlapping viral inserts which 
encompass approximately 4.6 kb at the 3’ end of the MHV-JHM genome (Skinner & Siddell, 
1983, 1985; Skinner et al., 1985; Pfleiderer et al., 1986). Using specific oligonucleotide primers 
we have now isolated two further clones which contain inserts extending to the 5’ end of mRNA 
3. The regions of the three clones which together contain the MHV S gene (Fig. 1) have been 
completely sequenced on both strands. This sequence, together with the predicted amino acid 
sequence of the S gene product is presented in this paper. 


METHODS 

cDNA cloning. The isolation and characterization of the plasmid pJMS1010 has been previously described 
(Skinner et al., 1985). The growth of Sac(-) cells, the propagation of MHV-JHM stocks and the isolation of 
polyadenylated RNA from MHV-JHM-infected cells have also been described (Siddell et al., 1980). The 
oligonucleotide primers A (3’ GTCGACGACCACACGG 5’), B (3’ GTGTGGGACATTCGGAT 5’) and others 
were synthesized using the phosphoramidite method on an Applied Biosystems 380A DNA Synthesizer. cDNA 
synthesis was carried out using the method of Gubler & Hoffman (1983) with slight modifications. In particular, 
prior to tailing the double-stranded cDNA with dC residues, potential RNA overhangs were removed by 
treatment with DNase-free RNase A (10 µg/ml) for 8 min at 37 °C. The tailed ds cDNA was cloned into dG-tailed 
PstI-cleaved pAT153. This material was used to transform Escherichia coli DH1 and selection was made for 
tetracycline resistance. Clones containing viral inserts were identified by colony hybridization using 
polynucleotide kinase 32P-labelled cDNA synthesis primer as probe. The size of viral inserts in plasmids from 
hybridizing clones was determined by gel electrophoresis of PstI-cleaved DNA. An oligonucleotide (3’ 
GCATGCCTGCGGTTAG 5’), which corresponds to a region near the 5’ end of the MHV-JHM mRNA leader 
(Skinner & Siddell, 1983), was used in hybridizations to identify the plasmid pJS92. 
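
As an illustrative aside (not part of the original methods), the primers are printed 3’ to 5’, so the mRNA-sense sequence to which each anneals can be obtained by complementing the printed string base by base. The minimal Python sketch below assumes that convention; the helper name `target_of` and the two short sense-strand excerpts copied from Fig. 2 are chosen here for illustration only.

```python
# Base-pairing partners for DNA.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def target_of(primer_3to5: str) -> str:
    """Sense-strand (5'->3') sequence that pairs with a primer printed 3'->5'."""
    return primer_3to5.translate(COMPLEMENT)

# Primers as printed in the Methods (3'->5').
PRIMERS = {"A": "GTCGACGACCACACGG", "B": "GTGTGGGACATTCGGAT"}

# Short sense-strand excerpts copied from Fig. 2 (illustrative only).
EXCERPTS = {
    "A": "CCACCTTGGACTGCAGCTGCTGGTGTGCCATTCAGT",
    "B": "CCTTACACACCCTGTAAGCCTAATACCAAT",
}

for name, primer in PRIMERS.items():
    site = target_of(primer)
    print(name, site, "found" if site in EXCERPTS[name] else "not found")
```

Under these assumptions both primers locate a matching stretch in the excerpts, which is consistent with their described use for priming cDNA synthesis on viral RNA.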

Subcloning in M13. Fragments of the viral inserts contained within pJMS1010, pJS112 and pJS92 were 
generated by a variety of restriction enzymes and were cloned either as mixtures or as single fragments (purified by 
electroelution from either agarose or acrylamide gel) into the M13 vectors mp8, mp9, mp18 and mp19. Where 
necessary, specific clones were identified by hybridization to single-stranded DNA probes generated from 
characterized M13 clones (O’Hare et al., 1983). 

DNA sequencing. M13 dideoxynucleotide sequencing was carried out using [α-35S]dATP. The complete 
sequence was obtained on both strands. To complete the project oligonucleotides complementary to specific MHV 
sequences were synthesized and used to prime the sequencing reactions. Sequence data were analysed and 
assembled using the programs of Staden (1982a). 

Southern/Northern analysis. Northern blot analysis of RNA following electrophoresis in 1% 
agarose-formaldehyde gels, Southern blot analyses of DNA and nick translations were performed according to Maniatis et 
al. (1982). 


RESULTS 


The position of the MHV sequences contained within the plasmid pJMS1010 has been 
previously determined by Northern blot and sequence analysis (Skinner et al., 1985). The viral 
insert within the plasmid extends from within the M protein gene (which is translated from 
mRNA 6) to a position approximately 2.6 kb from the 5’ end of mRNA 3 (Fig. 1). A 16-base 
oligonucleotide, primer A, complementary to a sequence towards the 5’ end of the pJMS1010 
insert was used to prime cDNA synthesis from infected cell poly(A)-containing RNA. Plasmid 
pJS112 obtained from this experiment contained a 2.2 kb insert which hybridized in Northern 
blots to the MHV mRNAs 3, 2 and 1 (data not shown). Sequence analysis confirmed that the 3’ 
end of the insert corresponded to the cDNA synthesis primer. In a second cDNA synthesis 
experiment a 17-base oligonucleotide, primer B, complementary to a sequence towards the 5’ 
end of the pJS112 insert was used to obtain the plasmid pJS92. pJS92 contained an insert of 530 
bp and hybridized in Northern blot analysis to all viral mRNAs. Hybridization of the pJS92 
insert to the cDNA synthesis primer and to a primer corresponding to the MHV-JHM leader 
sequences (see Methods) was confirmed by Southern blot analysis of PstI-cleaved pJS92 DNA 
(data not shown). 

Fig. 1. Genomic organization of murine hepatitis virus. The relationship between the 3’ co-terminal 
nested set of mRNAs and the viral genome is shown, together with the coding regions for the structural 
proteins, nucleocapsid (N), membrane (M) and surface (S), specified by mRNAs 7, 6 and 3 respectively. 
The arrangement of the cDNA clones and the positions of the primers used are shown. The sequences 
presented in Fig. 2 are represented by the hatched box. 

A 3780-base sequence containing the gene encoding the MHV-JHM S propolypeptide (i.e. the 
predicted primary translation product) is presented in Fig. 2. Immediately preceding the AUG 
initiation codon is the sequence UCUAAAC. This sequence is identical to genomic sequences 
preceding the known or presumed 5’ initiation codons of mRNAs 7, 5 and 4 and differs by only 
one base from the sequence UCCAAAC, preceding the initiation codon of mRNA 6 (Skinner et 
al., 1985; Skinner & Siddell, 1985; Pfleiderer et al., 1986). It is thought that these sequences, 
referred to as regions of homology, are involved in regulating the synthesis of MHV mRNAs 
(Armstrong et al., 1984; Spaan et al., 1983). 

The AUG codon at position 31 initiates an open reading frame (ORF) of 3705 bases encoding 
a polypeptide of 1235 amino acids with a predicted mol. wt. of 136600. This ORF ends with a 
single UGA termination codon. The sequence context of the initiating codon, AAACAUGC, is 
frequently found amongst functional eukaryotic initiator sequences (Kozak, 1983). 
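
The coordinates quoted above can be checked with simple arithmetic. The following Python sketch is an illustration only, assuming 1-based numbering from the first base of Fig. 2; it uses the first 42 bases of that sequence (written as DNA) and the variable names are arbitrary.

```python
# First 42 bases of the Fig. 2 sequence (DNA sense strand).
FIVE_PRIME = "CTTGTAGTTTAAATCTAATCTAATCTAAACATGCTGTTCGTC"

start = FIVE_PRIME.find("ATG") + 1           # 1-based position of the first ATG
upstream = FIVE_PRIME[start - 8:start - 1]   # seven bases just before the ATG
context = FIVE_PRIME[start - 5:start + 3]    # four bases before, ATG, one after

CODONS = 1235                                # length of the predicted polypeptide
orf_length = 3 * CODONS                      # sense codons only
stop_end = start + orf_length + 3 - 1        # last base of the stop codon

print(start, upstream, context, orf_length, stop_end)
```

With these inputs the first ATG falls at position 31, is preceded by TCTAAAC, sits in the context AAACATGC, and the stop codon ends within the 3780 bases presented.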

A number of structural features of the S propolypeptide are noteworthy. Firstly, within the 
MHV S propolypeptide sequence there are 21 potential N-glycosylation sites of the type Asn-X-Thr/Ser 
(assuming that X is not Pro) (Fig. 2). The distribution of these sites is also shown in Fig. 
3. It is clear that at least one cluster of potential glycosylation sites occurs in the carboxy-terminal 
region of the polypeptide, between amino acids 1092 and 1158. Secondly, a 
hydropathicity plot of the amino acid sequence of the S propolypeptide, determined using the 


CTTGTAGTTTAAATCTAATCTAATCTAAACATGCTGTTCGTCTTTATTTTACTATTACCCTCTTGTTTAGGGTATATTGGTGATTTTAGA
MetLeuPheValPheIleLeuLeuLeuProSerCysLeuGlyTyrIleGlyAspPheArg
MLFVFILLLPSCLGYIGDFR


TGTATCCAGACCGTGAATTATAACGGCAATAATGCTTCTGCGCCTAGCATTAGCACCGAAGCAGTCGATGTTTCCAAAGGTCGGGGCACT
CysIleGlnThrValAsnTyrAsnGlyAsnAsnAlaSerAlaProSerIleSerThrGluAlaValAspValSerLysGlyArgGlyThr
CIQTVNYNGNNASAPSISTEAVDVSKGRGT
TACTATGTTTTAGATCGTGTTTACTTAAATGCCACGTTATTGCTTACTGGTTATTATCCTGTGGACGGTTCCAATTATCGGAATCTCGCG 
TyrTyrValLeuAspArgValTyrLeuAsnAlaThrLeuLeuLeuThrGlyTyrTyrProValAspGlySerAsnTyrArgAsnLeuAla
YYVLDRVYLNATLLLTGYYPVDGSNYRNLA


CTTACAGGCACTAATACCTTAAGCCTTACGTGGTTTAAACCACCCTTTCTAAGTGAGTTTAATGATGGTATATTTGCTAAGGTCCAGAAC 
LeuThrGlyThrAsnThrLeuSerLeuThrTrpPheLysProProPheLeuSerGluPheAsnAspGlyIlePheAlaLysValGlnAsn
LTGTNTLSLTWFKPPFLSEFNDGIFAKVQN


CTCAAGACAAATACGCCAACAGGTGCAACCTCATATTTTCCCACTATAGTTATAGGTAGTTTGTTTGGTAACACTTCCTATACCGTAGTT 
LeuLysThrAsnThrProThrGlyAlaThrSerTyrPheProThrIleValIleGlySerLeuPheGlyAsnThrSerTyrThrValVal
LKTNTPTGATSYFPTIVIGSLFGNTSYTVV
TTAGAGCCATATAATAATATTATAATGGCTTCTGTTTGTACATATACCATTTGTCAATTACCTTACACACCCTGTAAGCCTAATACCAAT 
LeuGluProTyrAsnAsnIleIleMetAlaSerValCysThrTyrThrIleCysGlnLeuProTyrThrProCysLysProAsnThrAsn
LEPYNNIIMASVCTYTICQLPYTPCKPNTN


GGTAATCGTGTTATTGGATTTTGGCACACAGATGTCAAACCGCCGATTTGTCTTTTAAAGCGTAATTTTACGTTTAATGTTAATGCCCCT 
GlyAsnArgValIleGlyPheTrpHisThrAspValLysProProIleCysLeuLeuLysArgAsnPheThrPheAsnValAsnAlaPro
GNRVIGFWHTDVKPPICLLKRNFTFNVNAP
TGGCTTTATTTCCATTTTTATCAGCAGGGTGGTACTTTTTATGCGTACTATGCGGATAAACCTTCCGCTACTACGTTTTTGTTTAGTGTG 
TrpLeuTyrPheHisPheTyrGlnGlnGlyGlyThrPheTyrAlaTyrTyrAlaAspLysProSerAlaThrThrPheLeuPheSerVal 
WLYFHFYQQGGTFYAYYADKPSATTFLFSV
TATATTGGCGACATTTTAACACAGTATTTTGTGTTACCTTTTATTTGTACTCCAACAGCTGGTAGCACTTTAGCTCCGCTCTATTGGGTT 
TyrIleGlyAspIleLeuThrGlnTyrPheValLeuProPheIleCysThrProThrAlaGlySerThrLeuAlaProLeuTyrTrpVal
YIGDILTQYFVLPFICTPTAGSTLAPLYWV
ACACCTTTACTTAAGCGCCAATATTTGTTTAATTTTAATGAAAAGGGTGTCATTACTAGTGCTGTTGATTGCGCCAGCAGCTACATTAGT 
ThrProLeuLeuLysArgGlnTyrLeuPheAsnPheAsnGluLysGlyValIleThrSerAlaValAspCysAlaSerSerTyrIleSer
TPLLKRQYLFNFNEKGVITSAVDCASSYIS


GAAATAAAATGTAAGACCCAAAGTCTCTTACCGAGTACTGGTGTCTATGATCTATCCGGTTACACGGTCCAACCTGTTGGAGTTGTGTAC 
GluIleLysCysLysThrGlnSerLeuLeuProSerThrGlyValTyrAspLeuSerGlyTyrThrValGlnProValGlyValValTyr
EIKCKTQSLLPSTGVYDLSGYTVQPVGVVY


CGGCGTGTTCCTAACCTACCTGATTGTAAAATAGAGGAATGGCTCACTGCTAAATCTGTGCCGTCACCTCTCAATTGGGAGCGTAGGACT 
ArgArgValProAsnLeuProAspCysLysIleGluGluTrpLeuThrAlaLysSerValProSerProLeuAsnTrpGluArgArgThr
RRVPNLPDCKIEEWLTAKSVPSPLNWERRT


TTCCAAAATTGTAATTTTAATTTAAGCAGCCTGCTACGTTATGTCCAGGCTGAGTCTTTGTCGTGTAATAATATTGATGCGTCCAAAGTG 
PheGlnAsnCysAsnPheAsnLeuSerSerLeuLeuArgTyrValGlnAlaGluSerLeuSerCysAsnAsnIleAspAlaSerLysVal
FQNCNFNLSSLLRYVQAESLSCNNIDASKV
TATGGTATGTGCTTTGGTAGTGTCTCAGTTGATAAGTTTGCTATCCCCCGAAGCCGTCAAATTGATTTACAAATTGGCAACTCCGGATTT 
TyrGlyMetCysPheGlySerValSerValAspLysPheAlaIleProArgSerArgGlnIleAspLeuGlnIleGlyAsnSerGlyPhe
YGMCFGSVSVDKFAIPRSRQIDLQIGNSGF


TTGCAAACGGCTAATTATAAGATTGATACCGCTGCCACATCATGTCAGCTGTATTACAGTCTTCCTAAGAATAATGTTACCATAAATAAC 
LeuGlnThrAlaAsnTyrLysIleAspThrAlaAlaThrSerCysGlnLeuTyrTyrSerLeuProLysAsnAsnValThrIleAsnAsn
LQTANYKIDTAATSCQLYYSLPKNNVTINN
TATAACCCCTCGTCTTGGAATAGGAGGTATGGTTTTAAAGTAAATGATCGCTGCCAAATTTTTGCTAACATATTGTTAAATGGCATTAAT
TyrAsnProSerSerTrpAsnArgArgTyrGlyPheLysValAsnAspArgCysGlnIlePheAlaAsnIleLeuLeuAsnGlyIleAsn
YNPSSWNRRYGFKVNDRCQIFANILLNGIN
AGTGGGACTACGTGTTCCACAGATTTACAATTGCCTAATACTGAAGTGGCCACTGGCGTTTGCGTCAGATATGACCTCTATGGTATTACT 
SerGlyThrThrCysSerThrAspLeuGlnLeuProAsnThrGluValAlaThrGlyValCysValArgTyrAspLeuTyrGlyIleThr
SGTTCSTDLQLPNTEVATGVCVRYDLYGIT


GGTCAAGGTGTTTTTAAAGAGGTCAAGGCTGACTATTATAATAGCTGGCAGGCCCTATTATATGATGTTAATGGTAACTTAAACGGGTTC
GlyGlnGlyValPheLysGluValLysAlaAspTyrTyrAsnSerTrpGlnAlaLeuLeuTyrAspValAsnGlyAsnLeuAsnGlyPhe
GQGVFKEVKADYYNSWQALLYDVNGNLNGF


CGTGACCTTACCACTAACAAGACTTATACGATAAGGAGCTGTTATAGTGGCCGTGTTTCTGCTGCATATCATAAAGAAGCACCCGAACCG 
ArgAspLeuThrThrAsnLysThrTyrThrIleArgSerCysTyrSerGlyArgValSerAlaAlaTyrHisLysGluAlaProGluPro
RDLTTNKTYTIRSCYSGRVSAAYHKEAPEP
GCTCTGCTCTATCGTAATATAAATTGTAGTTATGTTTTTACTAATAATATTTCCCGTGAGGAAAACCCCCTTAACTATTTTGATAGTTAT
AlaLeuLeuTyrArgAsnIleAsnCysSerTyrValPheThrAsnAsnIleSerArgGluGluAsnProLeuAsnTyrPheAspSerTyr
ALLYRNINCSYVFTNNISREENPLNYFDSY


TTGGGTTGTGTTGTTAATGCTGATAACCGCACGGATGAGGCGCTTCCTAATTGCAATCTCCGTATGGGTGCTGGACTATGCGTAGATTAT 
LeuGlyCysValValAsnAlaAspAsnArgThrAspGluAlaLeuProAsnCysAsnLeuArgMetGlyAlaGlyLeuCysValAspTyr
LGCVVNADNRTDEALPNCNLRMGAGLCVDY


TCAAAGTCACGCAGAGCCCGCCGATCAGTTTCTACTGGCTATCGATTAACCACATTCGAGCCATACATGCCGATGTTAGTCAATGATAGC
SerLysSerArgArgAlaArgArgSerValSerThrGlyTyrArgLeuThrThrPheGluProTyrMetProMetLeuValAsnAspSer
SKSRRARRSVSTGYRLTTFEPYMPMLVNDS
GTTCAATCCGTAGGTGGATTATATGAGATGCAAATACCAACCAATTTTACTATTGGTCATCATGAGGAATTCATCCAGATAAGGGCTCCC
ValGlnSerValGlyGlyLeuTyrGluMetGlnIleProThrAsnPheThrIleGlyHisHisGluGluPheIleGlnIleArgAlaPro
VQSVGGLYEMQIPTNFTIGHHEEFIQIRAP
Fig. 2. Nucleotide sequence of the MHV surface protein gene and the predicted amino acid sequence of 
the surface protein precursor. The amino-terminal signal sequence ("7"), the carboxy-terminal 
transmembrane domain (---), the charge cluster (**), the cysteine-rich region (©), potential 


Coronavirus MHV surface projection glycoprotein 


AAGGTGACTATAGATTGTGCTGCATTTGTTTGTGGTGATAACGCTGCATGCAGACAGCAGTTGGTTGAGTATGGCTCTTTTTGTGATAAT 
LysValThrIleAspCysAlaAlaPheValCysGlyAspAsnAlaAlaCysArgGlnGlnLeuValGluTyrGlySerPheCysAspAsn
KVTIDCAAFVCGDNAACRQQLVEYGSFCDN


GTTAATGCCATTCTTAATGAGGTTAATAACCTCTTGGATAATATGCAATTACAAGTTGCTAGTGCATTAATGCAGGGTGTTACTATAAGT 
ValAsnAlaIleLeuAsnGluValAsnAsnLeuLeuAspAsnMetGlnLeuGlnValAlaSerAlaLeuMetGlnGlyValThrIleSer
VNAILNEVNNLLDNMQLQVASALMQGVTIS


TCGAGGCTGCCAGATGGCATCTCCGGCCCTATAGATGACATTAATTTCAGTCCTCTACTTGGATGCATAGGTTCAACATGTGCTGAAGAC 
SerArgLeuProAspGlyIleSerGlyProIleAspAspIleAsnPheSerProLeuLeuGlyCysIleGlySerThrCysAlaGluAsp
SRLPDGISGPIDDINFSPLLGCIGSTCAED


GGCAATGGACCTAGTGCGATACGGGGGCGTTCAGCTATAGAGGATTTATTATTTGACAAGGTCAAACTATCTGACGTTGGCTTTGTCGAG 
GlyAsnGlyProSerAlaIleArgGlyArgSerAlaIleGluAspLeuLeuPheAspLysValLysLeuSerAspValGlyPheValGlu
GNGPSAIRGRSAIEDLLFDKVKLSDVGFVE


GCTTATAACAATTGCACTGGTGGTCAAGAAGTTCGCGACCTCCTTTGCGTACAGTCTTTTAATGGCATCAAAGTATTACCTCCCGTGTTS 
AlaTyrAsnAsnCysThrGlyGlyGlnGluValArgAspLeuLeuCysValGlnSerPheAsnGlyIleLysValLeuProProValLeu
AYNNCTGGQEVRDLLCVQSFNGIKVLPPVL
TCTGAGAGTCAAATCTCTGGCTACACAGCGGGTGCTACTGCGGCAGCTATGTTCCCACCTTGGACTGCAGCTGCTGGTGTGCCATTCAGT 
SerGluSerGlnIleSerGlyTyrThrAlaGlyAlaThrAlaAlaAlaMetPheProProTrpThrAlaAlaAlaGlyValProPheSer
SESQISGYTAGATAAAMFPPWTAAAGVPFS


TTAAATGTTCAATATAGGATTAATGGTTTAGGTGTCACTATGAATGTTCTTAGTGAGAACCAAAAGATGATTGCTAGTGCTTTTAACAAC
LeuAsnValGlnTyrArgIleAsnGlyLeuGlyValThrMetAsnValLeuSerGluAsnGlnLysMetIleAlaSerAlaPheAsnAsn
LNVQYRINGLGVTMNVLSENQKMIASAFNN
GCGCTCGGTGCTATTCAGGAAGGGTTCGATGCAACCAATTCTGCTCTAGGTAAGATCCAGTCCGTTGTTAATGCAAACGCTGAAGCACTT 
AlaLeuGlyAlaIleGlnGluGlyPheAspAlaThrAsnSerAlaLeuGlyLysIleGlnSerValValAsnAlaAsnAlaGluAlaLeu
ALGAIQEGFDATNSALGKIQSVVNANAEAL
AATAATTTATTAAACCAACTTTCTAATAGGTTTGGTGCTATTAGTGCTTCTTTACAAGAAATTCTAACGCGGCTTGACGCTGTAGAAGCA 
AsnAsnLeuLeuAsnGlnLeuSerAsnArgPheGlyAlaIleSerAlaSerLeuGlnGluIleLeuThrArgLeuAspAlaValGluAla
NNLLNQLSNRFGAISASLQEILTRLDAVEA


AAGGCCCAGATAGATCGTCTTATTAATGGCAGGTTAACTGCACTTAATGCGTATATATCCAAGCAACTCAGTGATAGTACGCTTATTAAA 
LysAlaGlnIleAspArgLeuIleAsnGlyArgLeuThrAlaLeuAsnAlaTyrIleSerLysGlnLeuSerAspSerThrLeuIleLys
KAQIDRLINGRLTALNAYISKQLSDSTLIK


TTTAGTGCTGCTCAGGCCATCGAAAAGGTCAATGAGTGCGTTAAGAGCCAAACTACGCGCATTAATTTCTGTGGCAATGGTAATCACATA
PheSerAlaAlaGlnAlaIleGluLysValAsnGluCysValLysSerGlnThrThrArgIleAsnPheCysGlyAsnGlyAsnHisIle
FSAAQAIEKVNECVKSQTTRINFCGNGNHI


TTATCACTTGTCCAGAATGCGCCTTATGGCTTATGTTTTATTCATTTCAGCTACGTGCCAACATCCTTTAAAACGGCAAATGTGAGTCCT 
LeuSerLeuValGlnAsnAlaProTyrGlyLeuCysPheIleHisPheSerTyrValProThrSerPheLysThrAlaAsnValSerPro
LSLVQNAPYGLCFIHFSYVPTSFKTANVSP


GGACTATGCATTTCTGGTGATAGAGGATTGGCACCTAAAGCTGGATATTTTGTTCAAGATAATGGAGAGTGGAAGTTCACAGGCAGTAAT
GlyLeuCysIleSerGlyAspArgGlyLeuAlaProLysAlaGlyTyrPheValGlnAspAsnGlyGluTrpLysPheThrGlySerAsn
GLCISGDRGLAPKAGYFVQDNGEWKFTGSN


TATTACTACCCTGAACCCATTACAGATAAAAATAGTGTTGCCATGATCAGTTGCGCTGTGAATTACACAAAAGCGCCTGAAGTTTTCTTG
TyrTyrTyrProGluProIleThrAspLysAsnSerValAlaMetIleSerCysAlaValAsnTyrThrLysAlaProGluValPheLeu
YYYPEPITDKNSVAMISCAVNYTKAPEVFL


AACAACTCAATACCAAATCTACCCGACTTTAAGGAGGAGTTAGATAAATGGTTTAAGAATCAGACGTCTATTGCGCCTGATTTATCCCTC 
AsnAsnSerIleProAsnLeuProAspPheLysGluGluLeuAspLysTrpPheLysAsnGlnThrSerIleAlaProAspLeuSerLeu
NNSIPNLPDFKEELDKWFKNQTSIAPDLSL
GATTTCGAGAAGTTAAATGTTACTTTCCTGGACCTGACTTATGAGATGAACAGGATTCAGGATGCAATTAAGAAGTTAAATGAGAGCTAC 
AspPheGluLysLeuAsnValThrPheLeuAspLeuThrTyrGluMetAsnArgIleGlnAspAlaIleLysLysLeuAsnGluSerTyr
DFEKLNVTFLDLTYEMNRIQDAIKKLNESY


ATCAACCTCAAGGAAGTTGGCACATATGAAATGTATGTGAAATGGCCTTGGTATGTTTGGTTGCTAATTGGTTTAGCTGGTGTAGCTGTT
IleAsnLeuLysGluValGlyThrTyrGluMetTyrValLysTrpProTrpTyrValTrpLeuLeuIleGlyLeuAlaGlyValAlaVal
INLKEVGTYEMYVKWPWYVWLLIGLAGVAV


TGTGTGTTATTATTCTTTATATGTTGCTGCACAGGTTGCGGCTCATGTTGTTTTAGAAAATGCGGAAGTTGTTGTGATGAGTATGGAGGA
CysValLeuLeuPhePheIleCysCysCysThrGlyCysGlySerCysCysPheArgLysCysGlySerCysCysAspGluTyrGlyGly
CVLLFFICCCTGCGSCCFRKCGSCCDEYGG
CACCAGGACAGTATTGTGATACATAATATTTCAGCCCATGAGGATTGACTATCACAGCCTCTCCTGGAAAGACAGAAAATCTAAACAATT 
HisGlnAspSerIleValIleHisAsnIleSerAlaHisGluAspEnd
HQDSIVIHNISAHED*


glycosylation sites (@) and the putative proteolytic cleavage site (——) are indicated. 
procedure of Kyte & Doolittle (1982), reveals two regions of striking hydrophobicity (Fig. 3). At 
the amino terminus, the initiator methionine is followed by nine non-polar amino acids and this 
hydrophobic core precedes a number of small neutral residues (e.g. Ser-11, Gly-14) which are 
characteristically found at the signal peptidase recognition site (Fig. 2) (Von Heijne, 1984). At 
Fig. 3. (a) Hydropathicity analysis of the S propolypeptide according to the method of Kyte & 
Doolittle (1982). The vertical scale is the average hydropathicity (+3 to -3) for a frame of seven amino 
acids. The midpoint line represents the average hydropathicity of the 20 amino acids. Hydrophobic 
sequences appear above the base line. The putative signal (S) and anchor (A) domains, as well as the 
potential glycosylation sites (@) are shown. (b) Charge analysis of the MHV S propolypeptide. The 
analysis sums charge values over a window of nine amino acids. The putative cleavage site is indicated 


the carboxy terminus of the propolypeptide a sequence of 34 predominantly hydrophobic and 
neutral amino acids is followed by two positively charged amino acids (Arg-1209, Lys-1210), 
which are situated 25 amino acids from the carboxy terminus. Moreover, these ‘charge cluster’ 
residues are flanked by an unusual distribution of cysteine residues, which constitute 50% of the 
residues between positions 1198 and 1215. Not only an unusual distribution, but also a specific 
sequence, Cys-Gly-Ser-Cys-Cys, is found on either side of the ‘charge cluster’ (Fig. 2). Finally, 
the distribution of charged amino acids in the S propolypeptide (Fig. 3) indicates a particularly 
striking domain of basic residues, Arg-Arg-Ala-Arg-Arg, at positions 624 to 628. 
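
Two of the features described above lend themselves to a quick computational check. The Python sketch below is illustrative only and is not the procedure used in this study: it scans for Asn-X-Thr/Ser sequons (X not Pro) with a regular expression and computes the cysteine fraction of residues 1198 to 1215. The two one-letter fragments are transcribed from the translation in Fig. 2; the function and constant names are arbitrary.

```python
import re

# Asn followed by any residue except Pro, then Thr or Ser.
# The lookahead lets overlapping sequons be reported separately.
SEQUON = re.compile(r"N(?=[^P][ST])")

def sequon_positions(protein: str, offset: int = 1) -> list:
    """1-based positions of the Asn in each Asn-X-Thr/Ser sequon."""
    return [m.start() + offset for m in SEQUON.finditer(protein)]

# Residues 1-50 and 1198-1215 of the predicted S propolypeptide (Fig. 2).
N_TERMINUS = "MLFVFILLLPSCLGYIGDFRCIQTVNYNGNNASAPSISTEAVDVSKGRGT"
CYS_REGION = "CCCTGCGSCCFRKCGSCC"

print(sequon_positions(N_TERMINUS))
print(CYS_REGION.count("C") / len(CYS_REGION))
```

Applied to the complete 1235-residue sequence, the same scan would be expected to reproduce the count of potential sites given in the text; only the first 50 residues are used here to keep the example short.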


DISCUSSION 


The DNA sequences presented in Fig. 2 encompass the entire ‘unique’ region of MHV-JHM 
mRNA 3. In common with the mRNAs for the virion proteins, N and M, the 5’ unique region of 
mRNA 3 is used to encode a single polypeptide, the precursor to the surface protein. In each of the 
mRNAs encoding MHV virion structural proteins, translation is initiated at the first AUG 
codon within the mRNA and the expressed ORF does not overlap with any downstream ORF 
present in the 3’ co-terminal sequences. In mRNAs 3, 6 and 7 the initiating codon is found in a 
preferred context and follows closely (1, 4 and 8 bases respectively) the ‘region of homology’ 
sequences which define the 5’ ends of the mRNA bodies (Skinner & Siddell, 1983; Pfleiderer et 
al., 1986). These data further support the model of a non-overlapping translation strategy for 
coronavirus gene expression, at least for the virion structural proteins (Siddell, 1986; Brown et 
al., 1986). 

The predicted amino acid sequence of the S propolypeptide, derived from the DNA sequence, 
reveals several interesting features. The predicted mol. wt. of the S propolypeptide is 136600. 
This agrees approximately with the size of the polypeptide found in tunicamycin-treated cells or 
synthesized by in vitro translation (Siddell, 1983). If the average mol. wt. of mannose-rich viral glycoprotein 
carbohydrate side chains is assumed to be 2000 to 3000, it seems likely that a considerable 
number of the 21 potential glycosylation sites are utilized, in order to account for the 
approximately 40000 to 50000 apparent mol. wt. difference between the glycosylated and 
non-glycosylated S polypeptides (Siddell, 1982). 
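
The reasoning in the preceding paragraph can be made explicit with a short calculation. The Python sketch below simply restates the figures quoted in the text (21 sites, 2000 to 3000 per side chain, a 40000 to 50000 difference); it is a worked illustration under those assumptions, not an additional measurement.

```python
import math

SITES = 21                      # potential N-glycosylation sites
CHAIN_MW = (2000, 3000)         # assumed mol. wt. range per side chain
MW_GAP = (40000, 50000)         # glycosylated minus non-glycosylated

# Fewest sites needed: smallest gap filled with the largest chains.
fewest_sites = math.ceil(MW_GAP[0] / CHAIN_MW[1])
# Mass added if every potential site carried a chain.
full_occupancy = (SITES * CHAIN_MW[0], SITES * CHAIN_MW[1])

print(fewest_sites, full_occupancy)
```

On these numbers at least 14 of the 21 sites would need to be occupied, and full occupancy would add between 42000 and 63000, which brackets the observed difference.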

The hydropathicity analysis indicates that the MHV S polypeptide belongs to the group 1 
membrane proteins (Garoff, 1985). These proteins are inserted across the endoplasmic 
reticulum membrane starting from their amino terminus using the same mechanism as secretory 
proteins. They therefore carry a typical amino-terminal hydrophobic signal sequence which is 
not present in the mature protein. A putative signal sequence is present in the predicted MHV S 
propolypeptide. The small neutral glycine residue at position 14 appears to be a possible signal 
peptidase cleavage site (Von Heijne, 1984). 
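
For readers who wish to reproduce this kind of profile, a minimal Python sketch of a sliding-window hydropathy calculation follows. It uses the published Kyte & Doolittle (1982) scale and a frame of seven residues as in Fig. 3; the function name and the choice of the first 50 residues of the Fig. 2 translation as input are illustrative assumptions, and the code is not the program used to draw Fig. 3.

```python
# Kyte & Doolittle (1982) hydropathy index for the 20 amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def hydropathy_profile(protein: str, window: int = 7) -> list:
    """Mean hydropathy of every full window along the sequence."""
    return [
        sum(KD[aa] for aa in protein[i:i + window]) / window
        for i in range(len(protein) - window + 1)
    ]

# Residues 1-50 of the predicted S propolypeptide (Fig. 2).
N_TERMINUS = "MLFVFILLLPSCLGYIGDFRCIQTVNYNGNNASAPSISTEAVDVSKGRGT"

profile = hydropathy_profile(N_TERMINUS)
for start, value in enumerate(profile[:5], start=1):
    print(start, round(value, 2))
```

The windows covering the first residues score strongly positive, in keeping with the hydrophobic core of the putative signal sequence, whereas windows further along this fragment fall below zero.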

A second hydrophobic region at the carboxy terminus of the MHV S propolypeptide is 
characteristic of a transmembrane domain and is delineated from the hydrophilic cytoplasmic 
domain by the positively charged Arg/Lys residues at positions 1209/1210. Garoff and 
colleagues (Cutler & Garoff, 1986; Cutler et al., 1986) have tested the postulate that the 
hydrophobic transmembrane domain, together with the ‘charge cluster’ of the cytoplasmic 
domain, make up a membrane binding region of group 1 proteins, which acts as a ‘stop transfer 
signal’ to arrest translocation and prevent secretion. Their results, however, indicate that the 
‘charge cluster’ is necessary only for stabilization of the protein-membrane interaction and the 
hydrophobic stretch alone is able to arrest translocation. Surrounding the charge cluster in the 
MHV S polypeptide are an unusual number of cysteine residues, which occur in a specific 
sequence context. It seems possible that these sequences are involved in the acylation of the S2 
polypeptide, because acylation of the vesicular stomatitis virus G protein is believed to occur at 
cysteine residues in the vicinity of the hydrophobic transmembrane domain (Rose et al., 1984; 
McGee et al., 1984). 

During virus maturation the MHV S polypeptide is cleaved by a host cell enzyme to yield the 
S1 and S2 polypeptides. The charge analysis of the MHV S polypeptide reveals a sequence 
Arg-Arg-Ala-Arg-Arg at positions 624 to 628 which is very similar to a number of basic sequences 
involved in the cleavage of several other enveloped RNA virus glycoproteins (White et al., 
1983). Sturman & Holmes (1977) have shown that the MHV-A59 S polypeptide can be cleaved 
by trypsin and it appears likely that coronaviruses, in common with many enveloped RNA 
viruses, utilize a cellular trypsin-like endoprotease activity to achieve proteolytic processing. It 
is not yet known whether a second, carboxypeptidase, enzyme plays a role in the maturation of 
the coronavirus S protein, as has been shown for the influenza virus haemagglutinin (Garten & 
Klenk, 1983). 
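
A charge profile of the kind shown in Fig. 3(b) can be sketched in a few lines of Python. The example below sums charges over a window of nine residues, as stated in the legend, for residues 621 to 650 transcribed from Fig. 2. The charge assignments (+1 for Lys and Arg, -1 for Asp and Glu, 0 otherwise, including His) are an assumption made here, since the values used for the original figure are not given.

```python
# Assumed side-chain charges; all other residues count as 0.
CHARGE = {"K": 1, "R": 1, "D": -1, "E": -1}

def charge_profile(protein: str, window: int = 9) -> list:
    """Net charge of every full window along the sequence."""
    return [
        sum(CHARGE.get(aa, 0) for aa in protein[i:i + window])
        for i in range(len(protein) - window + 1)
    ]

# Residues 621-650 of the predicted S propolypeptide (Fig. 2).
FIRST_RESIDUE = 621
CLEAVAGE_REGION = "SKSRRARRSVSTGYRLTTFEPYMPMLVNDS"

profile = charge_profile(CLEAVAGE_REGION)
peak = max(profile)
peak_start = FIRST_RESIDUE + profile.index(peak)
print(peak, peak_start)
```

Under these assumptions the most basic window begins at residue 621 and spans the Arg-Arg-Ala-Arg-Arg sequence at positions 624 to 628.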

Following cleavage the fusion properties of the MHV S protein are activated (Sturman et al., 
1985). Examination of the MHV S polypeptide sequence shows that cleavage at the site 
mentioned above would not result in a strongly hydrophobic domain at the amino terminus of 
S2. This would contrast with the fusogenic myxovirus proteins, where the hydrophobicity of the 
amino terminus of, for example, the influenza virus HA2 protein is essential to the fusion 
process. Possibly, the mechanism of coronavirus-induced cell fusion involves either a less 
hydrophobic amino-terminal domain on S2 or other, as yet unidentified, hydrophobic domains 
on the S protein. 

Finally, it is of interest to compare the nucleic acid and predicted amino acid sequences of the 
MHV S protein gene reported here with the recently determined sequence of the avian 
infectious bronchitis coronavirus (IBV) S protein gene (Binns et al., 1985). At the nucleic acid 
level the sequences appear to be essentially unrelated. However, a comparison of the predicted 
amino acid sequences using the dot matrix program DIAGON (Staden, 1982b), which looks not 
only for identical residues, but also for residues with similar properties, reveals a striking degree 
of similarity in the S2 polypeptide, but little similarity in the S1 region (Fig. 4). To some extent 
the similarities in the S2 region represent recognizable features, for example, the transmembrane 
domain, the cysteine-rich region, or the putative cleavage site. Additionally, however, 
there are regions of amino acids with similar properties, and also specific sequences of identical 
amino acids (for example, the sequence Trp-Pro-Trp-Tyr-Val-Trp-Leu, positions 1184 to 
1191 for MHV; 1092 to 1099 for IBV), the significance of which remains to be determined. 

Fig. 4. DIAGON analysis (Staden, 1982b) of the homologies between the MHV-JHM S propolypeptide 
amino acid sequence and the IBV Beaudette S propolypeptide sequences (Binns et al., 1985). Points 
represent matches at the 1% significance level. The putative MHV cleavage site as well as the known 
cleavage site (Cavanagh et al., 1986) are indicated (1). 

The cloning and sequencing of the MHV S protein gene is an important step in studies on the 
molecular biology of MHV and especially the interaction between the virus, the host cell and the 
immune system during infection. Experiments to investigate these interactions at the molecular 
level can now be undertaken. 